Mitigating I/O latency in SSD-based graph traversal
Authors
Abstract
Mining large graphs has become an important aspect of many applications. Recent interest in low-cost graph traversal on single machines has led to the construction of systems that use solid state drives (SSDs) to store the graph. An SSD can be accessed with far lower latency than magnetic media, while remaining cheaper than main memory. Unfortunately, SSDs are slower than main memory, and algorithms running on such systems are hampered by large I/O latencies when accessing the SSD. In this paper we present two novel techniques to reduce the impact of SSD I/O latency on semi-external memory graph traversal. First, we introduce a variant of the Compressed Sparse Row (CSR) format that we call Compressed Enumerated Encoded Sparse Offset Row (CEESOR). CEESOR is particularly efficient for graphs with hierarchical structure and can reduce the space required to represent connectivity information by between 5% and 76%. CEESOR allows a larger number of edges to be moved for each unit of I/O transfer from the SSD to main memory and makes more effective use of operating system caches. Our second contribution is a runtime prefetching technique that exploits the ability of solid state drives to service multiple random access requests in parallel. We present a novel Run Along SSD Prefetcher (RASP). RASP hides the effect of I/O latency in single-threaded graph traversal in breadth-first and shortest-path order, improving iteration time for large graphs by a factor of 2.6x to 6x.
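For readers unfamiliar with the baseline that CEESOR modifies, the following is a minimal sketch of the standard CSR layout and a breadth-first traversal over it. It is not CEESOR or RASP, whose encodings and prefetch logic the abstract does not specify. The function names `build_csr` and `bfs_order` are hypothetical, and the in-memory Python lists stand in for arrays that a semi-external system would keep partly on the SSD.

```python
from collections import deque


def build_csr(num_vertices, edges):
    """Build plain CSR arrays from a list of directed (src, dst) edges.

    offsets[v]..offsets[v + 1] delimits the slice of `targets` that holds
    the out-neighbours of vertex v. (Hypothetical helper, for illustration.)
    """
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count out-degree of each source
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]   # prefix sum turns counts into offsets

    targets = [0] * len(edges)
    cursor = offsets[:-1]              # slicing copies: next free slot per vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def bfs_order(offsets, targets, source):
    """Return the vertices reachable from `source` in breadth-first order."""
    visited = [False] * (len(offsets) - 1)
    visited[source] = True
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        # In a semi-external setting, this slice read is the access that
        # would touch the SSD; here it is an ordinary list access.
        for i in range(offsets[v], offsets[v + 1]):
            w = targets[i]
            if not visited[w]:
                visited[w] = True
                queue.append(w)
    return order


if __name__ == "__main__":
    offsets, targets = build_csr(5, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
    print(offsets, targets, bfs_order(offsets, targets, 0))
```

In this layout each vertex's neighbour list is a contiguous slice of `targets`, so a more compact edge representation means more edges arrive per fixed-size read, which is the effect the abstract attributes to CEESOR.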
Related resources
Unblinding the OS to Optimize User-Perceived Flash SSD Latency
In this paper, we present a flash solid-state drive (SSD) optimization that provides hints of SSD internal behaviors, such as device I/O time and buffer activities, to the OS in order to mitigate the impact of I/O completion scheduling delays. The hints enable the OS to make reliable latency predictions of each I/O request so that the OS can make accurate scheduling decisions when to yield or b...
Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph
Social networks often require the ability to perform low latency graph computations in the user request path. For example, at LinkedIn, we show the graph distance and common connections when we show a profile in any context on the site. To do this, we have developed a distributed and partitioned graph system that scales to hundreds of millions of members and their connections, handling hundreds...
Extending SSD Lifetimes with Disk-Based Write Caches
We present Griffin, a hybrid storage device that uses a hard disk drive (HDD) as a write cache for a Solid State Device (SSD). Griffin is motivated by two observations: First, HDDs can match the sequential write bandwidth of mid-range SSDs. Second, both server and desktop workloads contain a significant fraction of block overwrites. By maintaining a log-structured HDD cache and migrating cached...
LightNVM: The Linux Open-Channel SSD Subsystem
As Solid-State Drives (SSDs) become commonplace in data-centers and storage arrays, there is a growing demand for predictable latency. Traditional SSDs, serving block I/Os, fail to meet this demand. They offer a high-level of abstraction at the cost of unpredictable performance and suboptimal resource utilization. We propose that SSD management trade-offs should be handled through Open-Channel ...
Pointer Reduction Techniques for Minimising Memory Usage, I/O Bandwidth and Computational Effort in BDD Applications
BDDs (Binary Decision Diagrams) are often used to represent Boolean expressions in hardware synthesis, hardware and software verification and numerous other applications. BDD computation, implemented using tree data structures with binary nodes, is inherently memory intensive, and therefore suffers from the von Neumann memory bottleneck. This thesis examines an approach which can speed up BDD c...
Publication date: 2012